Overview

Dataset statistics

Number of variables: 17
Number of observations: 2966
Missing cells: 443
Missing cells (%): 0.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 394.0 KiB
Average record size in memory: 136.0 B
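
The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming "Missing cells (%)" is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Values copied from the "Dataset statistics" table above.
n_variables = 17
n_observations = 2966
missing_cells = 443
total_size_kib = 394.0

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average record size in bytes (assumes 1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, both results agree with the values reported in the table.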

Variable types

Numeric: 9
Categorical: 8

Warnings

currentSmoker is highly correlated with cigsPerDay (High correlation)
cigsPerDay is highly correlated with currentSmoker (High correlation)
prevalentHyp is highly correlated with sysBP and 1 other field (High correlation)
diabetes is highly correlated with glucose (High correlation)
sysBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diaBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diaBP is highly correlated with sysBP and 1 other field (High correlation)
glucose is highly correlated with diabetes (High correlation)
education has 70 (2.4%) missing values (Missing)
BPMeds has 37 (1.2%) missing values (Missing)
totChol has 37 (1.2%) missing values (Missing)
glucose has 277 (9.3%) missing values (Missing)
Unnamed: 0 has unique values (Unique)
cigsPerDay has 1499 (50.5%) zeros (Zeros)

Reproduction

Analysis started: 2021-07-29 20:09:40.911070
Analysis finished: 2021-07-29 20:09:51.955294
Duration: 11.04 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct: 2966
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2112.498314
Minimum: 1
Maximum: 4236
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 205.25
Q1: 1042.25
median: 2109
Q3: 3165.75
95-th percentile: 4030.75
Maximum: 4236
Range: 4235
Interquartile range (IQR): 2123.5

Descriptive statistics

Standard deviation: 1230.560196
Coefficient of variation (CV): 0.5825141669
Kurtosis: -1.207104672
Mean: 2112.498314
Median Absolute Deviation (MAD): 1062.5
Skewness: 0.003230074103
Sum: 6265670
Variance: 1514278.395
Monotonicity: Strictly increasing
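
Several of the statistics above are simple functions of the others. The sketch below recomputes them from the reported values as a consistency check. It assumes the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative only:

```python
# Reported values for "Unnamed: 0".
count = 2966            # no missing values, so every row contributes
total = 6265670         # "Sum"
minimum, maximum = 1, 4236
q1, q3 = 1042.25, 3165.75
std = 1230.560196

# Derived statistics under the usual definitions.
mean = total / count
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean
variance = std ** 2

print(f"mean={mean:.6f} range={value_range} iqr={iqr}")
print(f"cv={cv:.10f} variance={variance:.3f}")
```

The recomputed values match the table up to the rounding of the reported standard deviation.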
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
2                      1      < 0.1%
669                    1      < 0.1%
657                    1      < 0.1%
2706                   1      < 0.1%
659                    1      < 0.1%
661                    1      < 0.1%
2710                   1      < 0.1%
663                    1      < 0.1%
2712                   1      < 0.1%
2714                   1      < 0.1%
Other values (2956)    2956   99.7%

Value                  Count  Frequency (%)
1                      1      < 0.1%
2                      1      < 0.1%
3                      1      < 0.1%
4                      1      < 0.1%
5                      1      < 0.1%
7                      1      < 0.1%
8                      1      < 0.1%
10                     1      < 0.1%
11                     1      < 0.1%
13                     1      < 0.1%

Value                  Count  Frequency (%)
4236                   1      < 0.1%
4235                   1      < 0.1%
4234                   1      < 0.1%
4232                   1      < 0.1%
4230                   1      < 0.1%
4228                   1      < 0.1%
4227                   1      < 0.1%
4226                   1      < 0.1%
4225                   1      < 0.1%
4224                   1      < 0.1%

sex
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.3 KiB
0: 1664
1: 1302

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2966
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1664   56.1%
1      1302   43.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
01664
56.1%
11302
43.9%

Most occurring characters

ValueCountFrequency (%)
01664
56.1%
11302
43.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01664
56.1%
11302
43.9%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01664
56.1%
11302
43.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01664
56.1%
11302
43.9%

age
Real number (ℝ≥0)

Distinct39
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean49.52629804
Minimum32
Maximum70
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum32
5-th percentile37
Q142
median49
Q356
95-th percentile64
Maximum70
Range38
Interquartile range (IQR)14

Descriptive statistics

Standard deviation8.530778493
Coefficient of variation (CV)0.1722474489
Kurtosis-0.9748599779
Mean49.52629804
Median Absolute Deviation (MAD)7
Skewness0.2387323172
Sum146895
Variance72.7741817
MonotonicityNot monotonic
Histogram with fixed size bins (bins=39)
ValueCountFrequency (%)
42134
 
4.5%
40131
 
4.4%
41128
 
4.3%
48127
 
4.3%
46122
 
4.1%
43120
 
4.0%
39119
 
4.0%
45117
 
3.9%
44109
 
3.7%
52109
 
3.7%
Other values (29)1750
59.0%
ValueCountFrequency (%)
321
 
< 0.1%
334
 
0.1%
3412
 
0.4%
3527
 
0.9%
3663
2.1%
3763
2.1%
3892
3.1%
39119
4.0%
40131
4.4%
41128
4.3%
ValueCountFrequency (%)
701
 
< 0.1%
694
 
0.1%
6814
 
0.5%
6734
 
1.1%
6623
 
0.8%
6541
1.4%
6458
2.0%
6376
2.6%
6261
2.1%
6185
2.9%

education
Categorical

MISSING

Distinct4
Distinct (%)0.1%
Missing70
Missing (%)2.4%
Memory size23.3 KiB
1.0
1186 
2.0
897 
3.0
483 
4.0
330 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters8688
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row1.0
3rd row3.0
4th row3.0
5th row2.0

Common Values

ValueCountFrequency (%)
1.01186
40.0%
2.0897
30.2%
3.0483
16.3%
4.0330
 
11.1%
(Missing)70
 
2.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1.01186
41.0%
2.0897
31.0%
3.0483
16.7%
4.0330
 
11.4%
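
The percentages in the "Common Values" table and in the pie-chart table differ for the same counts. The likely reason, stated here as an assumption rather than something the report says, is that the first table divides by all 2966 observations (including the 70 missing), while the second divides by the 2896 non-missing values only. A short sketch that reproduces both sets of figures:

```python
# Counts for "education" copied from the tables above.
counts = {"1.0": 1186, "2.0": 897, "3.0": 483, "4.0": 330}
missing = 70

non_missing = sum(counts.values())   # values actually observed
all_rows = non_missing + missing     # every observation

# Percentages relative to all rows (as in "Common Values").
pct_of_all = {k: round(100 * v / all_rows, 1) for k, v in counts.items()}

# Percentages relative to non-missing rows (as in the pie-chart table).
pct_of_observed = {k: round(100 * v / non_missing, 1) for k, v in counts.items()}

for level in counts:
    print(level, pct_of_all[level], pct_of_observed[level])
```

Under that assumption both columns of percentages are consistent with the same underlying counts.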

Most occurring characters

ValueCountFrequency (%)
.2896
33.3%
02896
33.3%
11186
13.7%
2897
 
10.3%
3483
 
5.6%
4330
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5792
66.7%
Other Punctuation2896
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02896
50.0%
11186
20.5%
2897
 
15.5%
3483
 
8.3%
4330
 
5.7%
Other Punctuation
ValueCountFrequency (%)
.2896
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8688
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.2896
33.3%
02896
33.3%
11186
13.7%
2897
 
10.3%
3483
 
5.6%
4330
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII8688
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.2896
33.3%
02896
33.3%
11186
13.7%
2897
 
10.3%
3483
 
5.6%
4330
 
3.8%

currentSmoker
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size23.3 KiB
0
1499 
1
1467 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2966
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row1
4th row1
5th row0

Common Values

ValueCountFrequency (%)
01499
50.5%
11467
49.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
01499
50.5%
11467
49.5%

Most occurring characters

ValueCountFrequency (%)
01499
50.5%
11467
49.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01499
50.5%
11467
49.5%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01499
50.5%
11467
49.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01499
50.5%
11467
49.5%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct32
Distinct (%)1.1%
Missing21
Missing (%)0.7%
Infinite0
Infinite (%)0.0%
Mean9.026825127
Minimum0
Maximum70
Zeros1499
Zeros (%)50.5%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q320
95-th percentile30
Maximum70
Range70
Interquartile range (IQR)20

Descriptive statistics

Standard deviation11.99018622
Coefficient of variation (CV)1.32828387
Kurtosis1.118487785
Mean9.026825127
Median Absolute Deviation (MAD)0
Skewness1.267415023
Sum26584
Variance143.7645655
MonotonicityNot monotonic
Histogram with fixed size bins (bins=32)
ValueCountFrequency (%)
01499
50.5%
20508
 
17.1%
30164
 
5.5%
15137
 
4.6%
10101
 
3.4%
996
 
3.2%
593
 
3.1%
373
 
2.5%
4054
 
1.8%
145
 
1.5%
Other values (22)175
 
5.9%
ValueCountFrequency (%)
01499
50.5%
145
 
1.5%
210
 
0.3%
373
 
2.5%
46
 
0.2%
593
 
3.1%
612
 
0.4%
78
 
0.3%
85
 
0.2%
996
 
3.2%
ValueCountFrequency (%)
701
 
< 0.1%
609
 
0.3%
504
 
0.1%
452
 
0.1%
4339
 
1.3%
4054
 
1.8%
3516
 
0.5%
30164
5.5%
291
 
< 0.1%
2533
 
1.1%

BPMeds
Categorical

MISSING

Distinct2
Distinct (%)0.1%
Missing37
Missing (%)1.2%
Memory size23.3 KiB
0.0
2844 
1.0
 
85

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters8787
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

ValueCountFrequency (%)
0.02844
95.9%
1.085
 
2.9%
(Missing)37
 
1.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.02844
97.1%
1.085
 
2.9%

Most occurring characters

ValueCountFrequency (%)
05773
65.7%
.2929
33.3%
185
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5858
66.7%
Other Punctuation2929
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
05773
98.5%
185
 
1.5%
Other Punctuation
ValueCountFrequency (%)
.2929
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8787
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
05773
65.7%
.2929
33.3%
185
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8787
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
05773
65.7%
.2929
33.3%
185
 
1.0%

prevalentStroke
Categorical

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size23.3 KiB
0
2952 
1
 
14

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2966
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

Most occurring characters

ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02952
99.5%
114
 
0.5%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size23.3 KiB
0
2026 
1
940 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2966
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row0
5th row1

Common Values

ValueCountFrequency (%)
02026
68.3%
1940
31.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02026
68.3%
1940
31.7%

Most occurring characters

ValueCountFrequency (%)
02026
68.3%
1940
31.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02026
68.3%
1940
31.7%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02026
68.3%
1940
31.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02026
68.3%
1940
31.7%

diabetes
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size23.3 KiB
0
2887 
1
 
79

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2966
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

Most occurring characters

ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02887
97.3%
179
 
2.7%

totChol
Real number (ℝ≥0)

MISSING

Distinct231
Distinct (%)7.9%
Missing37
Missing (%)1.2%
Infinite0
Infinite (%)0.0%
Mean237.3898942
Minimum119
Maximum696
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum119
5-th percentile171
Q1206
median234
Q3264
95-th percentile313
Maximum696
Range577
Interquartile range (IQR)58

Descriptive statistics

Standard deviation44.71013481
Coefficient of variation (CV)0.1883405145
Kurtosis5.471135846
Mean237.3898942
Median Absolute Deviation (MAD)29
Skewness1.011718183
Sum695315
Variance1998.996155
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
24060
 
2.0%
21051
 
1.7%
26046
 
1.6%
23246
 
1.6%
22045
 
1.5%
20043
 
1.4%
23039
 
1.3%
22539
 
1.3%
24537
 
1.2%
25037
 
1.2%
Other values (221)2486
83.8%
ValueCountFrequency (%)
1191
 
< 0.1%
1261
 
< 0.1%
1351
 
< 0.1%
1371
 
< 0.1%
1402
 
0.1%
1433
0.1%
1441
 
< 0.1%
1451
 
< 0.1%
1492
 
0.1%
1507
0.2%
ValueCountFrequency (%)
6961
 
< 0.1%
6001
 
< 0.1%
4531
 
< 0.1%
4391
 
< 0.1%
4103
0.1%
4051
 
< 0.1%
3981
 
< 0.1%
3921
 
< 0.1%
3912
0.1%
3901
 
< 0.1%

sysBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct221
Distinct (%)7.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean132.6153068
Minimum85
Maximum295
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum85
5-th percentile104.125
Q1117
median128
Q3144
95-th percentile175.875
Maximum295
Range210
Interquartile range (IQR)27

Descriptive statistics

Standard deviation22.31195027
Coefficient of variation (CV)0.1682456634
Kurtosis2.393122758
Mean132.6153068
Median Absolute Deviation (MAD)13
Skewness1.198365464
Sum393337
Variance497.8231248
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
12072
 
2.4%
12567
 
2.3%
13065
 
2.2%
12263
 
2.1%
11563
 
2.1%
11061
 
2.1%
12460
 
2.0%
12657
 
1.9%
12351
 
1.7%
11650
 
1.7%
Other values (211)2357
79.5%
ValueCountFrequency (%)
851
 
< 0.1%
85.51
 
< 0.1%
901
 
< 0.1%
921
 
< 0.1%
92.51
 
< 0.1%
931
 
< 0.1%
93.52
 
0.1%
941
 
< 0.1%
956
0.2%
95.52
 
0.1%
ValueCountFrequency (%)
2951
 
< 0.1%
2481
 
< 0.1%
2441
 
< 0.1%
2351
 
< 0.1%
2321
 
< 0.1%
2202
0.1%
2171
 
< 0.1%
2153
0.1%
2132
0.1%
2104
0.1%

diaBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct140
Distinct (%)4.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean83.14649359
Minimum48
Maximum136
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum48
5-th percentile66
Q175
median82
Q390
95-th percentile105
Maximum136
Range88
Interquartile range (IQR)15

Descriptive statistics

Standard deviation12.04392745
Coefficient of variation (CV)0.1448518985
Kurtosis1.221638741
Mean83.14649359
Median Absolute Deviation (MAD)8
Skewness0.7079844392
Sum246612.5
Variance145.0561884
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
80180
 
6.1%
82104
 
3.5%
8195
 
3.2%
7091
 
3.1%
8488
 
3.0%
8588
 
3.0%
8782
 
2.8%
9081
 
2.7%
7880
 
2.7%
7977
 
2.6%
Other values (130)2000
67.4%
ValueCountFrequency (%)
481
 
< 0.1%
501
 
< 0.1%
511
 
< 0.1%
522
0.1%
541
 
< 0.1%
552
0.1%
561
 
< 0.1%
573
0.1%
57.53
0.1%
582
0.1%
ValueCountFrequency (%)
1362
 
0.1%
1352
 
0.1%
1331
 
< 0.1%
1321
 
< 0.1%
1305
0.2%
1291
 
< 0.1%
1281
 
< 0.1%
127.51
 
< 0.1%
1253
0.1%
124.51
 
< 0.1%

BMI
Real number (ℝ≥0)

Distinct1200
Distinct (%)40.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean25.85562036
Minimum15.54
Maximum56.8
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum15.54
5-th percentile20.06
Q123.07
median25.49
Q328.15
95-th percentile32.795
Maximum56.8
Range41.26
Interquartile range (IQR)5.08

Descriptive statistics

Standard deviation4.106459979
Coefficient of variation (CV)0.158822721
Kurtosis2.659488646
Mean25.85562036
Median Absolute Deviation (MAD)2.5
Skewness0.9665097326
Sum76687.77
Variance16.86301356
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
22.5414
 
0.5%
22.9114
 
0.5%
22.7312
 
0.4%
23.4811
 
0.4%
23.110
 
0.3%
25.2310
 
0.3%
28.310
 
0.3%
21.5110
 
0.3%
25.9410
 
0.3%
22.1910
 
0.3%
Other values (1190)2855
96.3%
ValueCountFrequency (%)
15.541
< 0.1%
16.481
< 0.1%
16.592
0.1%
16.611
< 0.1%
16.691
< 0.1%
16.731
< 0.1%
16.871
< 0.1%
16.921
< 0.1%
17.111
< 0.1%
17.171
< 0.1%
ValueCountFrequency (%)
56.81
< 0.1%
45.81
< 0.1%
45.791
< 0.1%
44.711
< 0.1%
44.551
< 0.1%
44.271
< 0.1%
44.091
< 0.1%
43.671
< 0.1%
43.481
< 0.1%
42.151
< 0.1%

heartRate
Real number (ℝ≥0)

Distinct73
Distinct (%)2.5%
Missing1
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean75.80134907
Minimum44
Maximum143
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum44
5-th percentile59
Q168
median75
Q382
95-th percentile98
Maximum143
Range99
Interquartile range (IQR)14

Descriptive statistics

Standard deviation12.08435816
Coefficient of variation (CV)0.1594214128
Kurtosis1.123123867
Mean75.80134907
Median Absolute Deviation (MAD)7
Skewness0.6936041159
Sum224751
Variance146.0317121
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
75409
 
13.8%
80253
 
8.5%
70215
 
7.2%
60163
 
5.5%
72158
 
5.3%
85149
 
5.0%
65128
 
4.3%
90119
 
4.0%
68114
 
3.8%
9565
 
2.2%
Other values (63)1192
40.2%
ValueCountFrequency (%)
441
 
< 0.1%
451
 
< 0.1%
461
 
< 0.1%
471
 
< 0.1%
484
 
0.1%
5017
0.6%
511
 
< 0.1%
5210
0.3%
536
 
0.2%
5412
0.4%
ValueCountFrequency (%)
1431
 
< 0.1%
1401
 
< 0.1%
1301
 
< 0.1%
1252
 
0.1%
1222
 
0.1%
1205
 
0.2%
1152
 
0.1%
1122
 
0.1%
11028
0.9%
1087
 
0.2%

glucose
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct130
Distinct (%)4.8%
Missing277
Missing (%)9.3%
Infinite0
Infinite (%)0.0%
Mean82.14540721
Minimum40
Maximum394
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size23.3 KiB

Quantile statistics

Minimum40
5-th percentile62
Q171
median78
Q387
95-th percentile108
Maximum394
Range354
Interquartile range (IQR)16

Descriptive statistics

Standard deviation25.42073162
Coefficient of variation (CV)0.3094601692
Kurtosis59.99597051
Mean82.14540721
Median Absolute Deviation (MAD)8
Skewness6.481499055
Sum220889
Variance646.2135959
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
75144
 
4.9%
77118
 
4.0%
80114
 
3.8%
73113
 
3.8%
70109
 
3.7%
83109
 
3.7%
78105
 
3.5%
7687
 
2.9%
8586
 
2.9%
7485
 
2.9%
Other values (120)1619
54.6%
(Missing)277
 
9.3%
ValueCountFrequency (%)
401
 
< 0.1%
441
 
< 0.1%
452
 
0.1%
471
 
< 0.1%
481
 
< 0.1%
503
0.1%
521
 
< 0.1%
531
 
< 0.1%
544
0.1%
556
0.2%
ValueCountFrequency (%)
3942
0.1%
3861
< 0.1%
3701
< 0.1%
3681
< 0.1%
3481
< 0.1%
3321
< 0.1%
3251
< 0.1%
3201
< 0.1%
2971
< 0.1%
2741
< 0.1%

TenYearCHD
Categorical

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size23.3 KiB
0
2527 
1
439 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2966
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row0
5th row0

Common Values

ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Most occurring characters

ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2966
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Most occurring scripts

ValueCountFrequency (%)
Common2966
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII2966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02527
85.2%
1439
 
14.8%

Interactions

[81 interaction plots, rendered with Matplotlib v3.4.2; figures not reproduced here.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
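The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code that produced this report; the function name `pearson_r` is an assumption, and the input values are the sysBP and diaBP entries of the first five sample rows.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# sysBP and diaBP taken from the first five sample rows of this report
sys_bp = [121.0, 127.5, 150.0, 130.0, 180.0]
dia_bp = [81.0, 80.0, 95.0, 84.0, 110.0]

r = pearson_r(sys_bp, dia_bp)
# Shifting and positively rescaling one variable leaves r unchanged
r_rescaled = pearson_r([2 * v + 10 for v in sys_bp], dia_bp)
```

The second call illustrates the location and scale invariance mentioned above: `r` and `r_rescaled` agree up to floating-point rounding.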

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
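As a rough sketch of this definition, the snippet below ranks each variable (ties receive the average of their positions, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)   # ranks agree exactly, so rho is 1
r = pearson_r(x, y)        # strictly below 1 because the relation is not linear
```

Comparing `rho` with `r` on this toy data shows why the rank-based coefficient picks up monotonic relationships that are not straight lines.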

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
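The pair-counting rule above corresponds to the variant usually called τ-a. A minimal sketch follows; the name `kendall_tau_a` and the toy data are assumptions, and statistical libraries often default to the tie-adjusted τ-b, which can differ from this value when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)  # 8 concordant, 2 discordant, 10 pairs -> 0.6
```

The loop is quadratic in the number of observations, which is fine for illustration but slow for large columns.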

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
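The following is a hedged sketch of the bias correction as it is commonly stated: the chi-squared based φ² is reduced by (k-1)(r-1)/(n-1), and the row and column counts r and k are shrunk in the same spirit. It is not the code used to build this report; the function name `cramers_v_corrected` and the toy inputs are assumptions.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equal-length categorical sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(phi2_corr / denom)


# Perfect association: b is a copy of a
a = [0] * 10 + [1] * 10
v_perfect = cramers_v_corrected(a, list(a))

# No association: every (a, b) combination is equally frequent
v_independent = cramers_v_corrected([0, 0, 1, 1] * 5, [0, 1, 0, 1] * 5)
```

On these toy inputs the corrected coefficient is 1 (up to rounding) for the copied column and exactly 0 for the balanced one.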

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
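One way to picture nullity correlation is to turn each column into a present/absent indicator and correlate the indicators. The sketch below does this with Pearson's r; the function name `nullity_correlation` and the three toy columns are hypothetical and are not taken from this dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson's r between the presence indicators (1 = present, 0 = missing)."""
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(present_a)
    mean_a = sum(present_a) / n
    mean_b = sum(present_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(present_a, present_b))
    var_a = sum((a - mean_a) ** 2 for a in present_a)
    var_b = sum((b - mean_b) ** 2 for b in present_b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no
        # nullity variation, so the correlation is undefined.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns, with None marking a missing cell
col_x = [1, None, 3, None, 5, 6]
col_y = [None, 2, None, 4, None, None]   # present exactly where col_x is missing
col_z = [9, None, 7, None, 5, 4]         # missing exactly where col_x is missing

opposite = nullity_correlation(col_x, col_y)   # close to -1
together = nullity_correlation(col_x, col_z)   # close to +1
```

A value near +1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present where the other is missing.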
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index)  Unnamed: 0  sex  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD
0        1           0    46   2.0        0              0.0         0.0     0                0             0         250.0    121.0  81.0   28.73  95.0       76.0     0
1        2           1    48   1.0        1              20.0        0.0     0                0             0         245.0    127.5  80.0   25.34  75.0       70.0     0
2        3           0    61   3.0        1              30.0        0.0     0                1             0         225.0    150.0  95.0   28.58  65.0       103.0    1
3        4           0    46   3.0        1              23.0        0.0     0                0             0         285.0    130.0  84.0   23.10  85.0       85.0     0
4        5           0    43   2.0        0              0.0         0.0     0                1             0         228.0    180.0  110.0  30.30  77.0       99.0     0
5        7           0    45   2.0        1              20.0        0.0     0                0             0         313.0    100.0  71.0   21.68  79.0       78.0     0
6        8           1    52   1.0        0              0.0         0.0     0                1             0         260.0    141.5  89.0   26.36  76.0       79.0     0
7        10          0    50   1.0        0              0.0         0.0     0                0             0         254.0    133.0  76.0   22.91  75.0       76.0     0
8        11          0    43   2.0        0              0.0         0.0     0                0             0         247.0    131.0  88.0   27.64  72.0       61.0     0
9        13          0    41   3.0        0              0.0         1.0     0                1             0         332.0    124.0  88.0   31.31  65.0       84.0     0

Last rows

(index)  Unnamed: 0  sex  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD
2956     4224        1    47   2.0        1              3.0         0.0     0                0             0         198.0    120.0  80.0   25.23  75.0       76.0     0
2957     4225        1    45   4.0        1              43.0        0.0     0                0             0         216.0    137.5  85.0   24.24  83.0       105.0    0
2958     4226        1    58   1.0        0              0.0         0.0     0                0             0         233.0    125.5  84.0   26.05  67.0       76.0     1
2959     4227        1    43   4.0        1              20.0        0.0     0                0             0         187.0    129.5  88.0   25.62  80.0       75.0     0
2960     4228        0    50   1.0        0              0.0         0.0     0                1             1         260.0    190.0  130.0  43.67  85.0       260.0    0
2961     4230        0    56   1.0        1              3.0         0.0     0                1             0         268.0    170.0  102.0  22.89  57.0       NaN      0
2962     4232        1    68   1.0        0              0.0         0.0     0                1             0         176.0    168.0  97.0   23.14  60.0       79.0     1
2963     4234        1    51   3.0        1              43.0        0.0     0                0             0         207.0    126.5  80.0   19.71  65.0       68.0     0
2964     4235        0    48   2.0        1              20.0        NaN     0                0             0         248.0    131.0  72.0   22.00  84.0       86.0     0
2965     4236        0    44   1.0        1              15.0        0.0     0                0             0         210.0    126.5  87.0   19.16  86.0       NaN      0